Abstract - In this article, a stochastic gradient based online
learning algorithm for Extreme Learning Machines (ELM), termed
SG-ELM, is developed. A stability criterion based on the Lyapunov
approach is used to prove both asymptotic stability of the
estimation error and stability of the estimated parameters, which
makes the algorithm suitable for identification of nonlinear
dynamic systems. The developed algorithm not only guarantees
stability, but also reduces the computational demand compared to
the OS-ELM approach based on recursive least squares. To
demonstrate the effectiveness of the algorithm in a real-world
scenario, an advanced combustion engine identification problem is
considered. The algorithm is applied to two case studies: online
regression learning for system identification of a Homogeneous
Charge Compression Ignition (HCCI) engine, and online
classification learning (with class imbalance) for identifying the
dynamic operating envelope of the HCCI engine. The results indicate
that the accuracy of the proposed SG-ELM is comparable to that of
the state of the art, while adding stability and a reduction in
computational effort.

Index Terms - Stochastic Gradient, Extreme Learning Machines,
Online Learning, Online Classification, System Identification,
Class Imbalance Learning, Lyapunov Stability, Homogeneous Charge
Compression Ignition, Operating Envelope Model, Misfire Prediction,
Engine Diagnostics, Engine Control.


I. Introduction 

Homogeneous Charge Compression Ignition (HCCI) engines are of
significant interest to the automotive industry owing to their
ability to reduce emissions and fuel consumption significantly
compared to traditional spark ignition and compression ignition
engines. The highly efficient operation of HCCI is achieved using
advanced control strategies such as exhaust gas recirculation
(EGR), variable valve timing (VVT) and intake charge heating,
among others. Such complex manipulations of the system result in
highly nonlinear behavior with a narrow region of stable operation.

Control of HCCI combustion is a major challenge for automotive
applications. Several factors contribute to the challenge,
including the absence of a direct trigger for combustion, a narrow
operating range and high sensitivity to disturbances. To address
the issue, advanced model based control methods are common, where
the control actions are often determined using a predictive model
of the engine. As alternatives to physics based modeling, which
might involve significant development time and associated costs,
data based approaches were introduced that take advantage of the
extensive experimentation performed during the engine calibration
process.

The key requirement for model based control of an HCCI engine is
the ability to accurately predict the engine state variables
several operating cycles ahead of time, so that a control action
with a known effect can be applied to the engine. Further, to guard
against the engine drifting towards instabilities such as misfire,
ringing and knock, knowledge of the operating limits of the engine,
particularly in transients, is required. In order to develop
controllers and operate the engine in a stable manner, both models
of the engine operating envelope and models of the engine state
variables are necessary.

The state variables of an engine can be defined as the fundamental
quantities that represent the state of operation of the engine. As
a consequence, these variables also influence the performance of
the engine, such as fuel efficiency, emissions and stability, and
are required to be monitored and regulated. For this work, the net
mean effective pressure (NMEP) and the phasing of the combustion
event (CA50) with respect to the engine's top dead center are
considered representative states that reflect the quality of
engine operation. More fundamental state variables such as
in-cylinder temperature, pressure and chemical composition of the
combustion mixture could be considered, but these variables cannot
be measured feasibly on a production engine.

The HCCI engine has a narrow region of stable operation defined by
an operating envelope. The dynamic operating envelope of an engine
can be defined as a stable region in the operating space of the
engine. The significance of the operating envelope and data based
approaches for modeling it were recently introduced by the authors.
Knowledge of the operating envelope is crucial for designing
efficient controllers for the following reasons. The developer can
gain insight into the actuator extremes [18], such as the minimum
and maximum quantity of fuel to be injected into the engine at
given speed and load conditions. The actuator extremes can then be
used to enforce constraints on the control variables for desired
engine operation. Furthermore, an operating envelope model could
enable the design of efficient engine diagnostic systems based on
predictive analytics. For instance, a misfire event is a lack of
combustion which produces no work output from the engine. The
misfired fuel enters the exhaust system, increasing emissions of
hydrocarbons and carbon monoxide. When the engine misfires,
pollutant levels may be higher than normal. Real time monitoring
of the exhaust emission control system and engine misfire
detection are essential to meet the requirements of On-Board
Diagnostic (OBD) regulations. The envelope model can be used to
alert the onboard diagnostics if the engine is about to misfire
owing to changes in system or operating conditions.

Data based modeling approaches for the HCCI engine state variables
and dynamic operating envelope were demonstrated by the authors
using neural networks, support vector machines and extreme
learning machines. However, the previous research considered an
offline approach, where the data collected from engine experiments
were taken offline and models were developed using computer
workstations with high processing power and memory. A key
requirement in advancing the capabilities of data based HCCI
modeling is to perform online learning, for the following reasons.
The models developed offline are valid only in the controlled
experimental conditions. For instance, the experiments are
performed at controlled ambient temperature, pressure and humidity
conditions. As a result, the models developed are valid for the
specified conditions, whereas when the models are implemented, for
instance, on a vehicle, they are expected to work over the wide
range of climatic conditions that the vehicle is exposed to,
possibly including conditions that were not covered in the
experiments. Hence, online adaptation to learn the behavior of the
system in new or unfamiliar situations is required. Also, since
the offline models are developed directly from experimental data,
they may perform poorly in certain operating regions where the
density of experimental data is low. As more data becomes
available in such regions, an online mechanism can be used to
adapt to such data. In addition, the engine produces high velocity
streaming data; operating at about 2500 revolutions per minute, an
in-cylinder pressure sensor can produce about 1.8 million data
observations per day. It becomes infeasible to store this data for
offline model development. Thus, an online learning framework that
processes every data observation, updates the model and discards
the data is required for advanced engines like HCCI.
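
As a rough check of the data rate quoted above, the following
arithmetic assumes one data observation per combustion cycle of a
single cylinder, and one combustion cycle every two revolutions as
in a four-stroke engine. Both are assumptions made here for
illustration:

$$\frac{2500\ \text{rev/min}}{2\ \text{rev/cycle}} \times 60\ \frac{\text{min}}{\text{h}} \times 24\ \frac{\text{h}}{\text{day}} = 1.8 \times 10^{6}\ \text{cycles/day}$$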

Online learning algorithms exist for linear and nonlinear models.
For combustion engine applications, algorithms involving linear
models are common in adaptive control. However, for a system like
the HCCI engine, linear models may be insufficient to capture the
complex dynamics, and the authors showed that nonlinear
identification models outperformed linear models, particularly for
predicting several steps ahead in time. While numerous techniques
for online learning exist in the machine learning literature, a
complete survey is beyond the scope of this article. The recent
paper on online sequential extreme learning machines (OS-ELM)
surveys popular online learning algorithms in the context of
classification and regression and develops an efficient algorithm
based on recursive least squares. The OS-ELM algorithm appears to
be the present state of the art for classification and regression
problems, achieving high generalization accuracy and a globally
optimal solution in short training time.

In spite of its known advantages, an over-parameterized ELM
suffers from an ill-conditioning problem when a recursive least
squares type update is performed (as in OS-ELM). This sometimes
results in poor regularization behavior [25], which leads to
unbounded growth of the model parameters and unbounded model
predictions. If decisions are made as the model is updated (as in
the case of adaptive control, for instance [26]), it is vital for
the parameter estimation to be stable so that model based
decisions are valid. Hence a guarantee of stability and
boundedness is of extreme importance. To address this issue, a
stable online learning algorithm based on stochastic gradient
descent is developed and its stability is proved using Lyapunov
stability theory. Lyapunov based approaches are popular in control
theory, and notable prior work for online learning includes
Lyapunov approaches applied to identification using radial basis
function neural networks and GLO-MAP models. The parameter update
in such methods involves either complex gradient calculations in
real time, or first estimating a linear model and then estimating
a nonlinear difference using orthonormal polynomial basis
functions. The approach proposed in this paper aims to retain the
simplicity and generalization power of the ELM and OS-ELM
algorithms, and to introduce stability in parameter estimation so
that such online models can be used for real-time control
purposes.

The objective of this article is to develop a stable online
learning algorithm for ELM models using stochastic gradients and
to apply it to the HCCI engine modeling problem. The contributions
of the paper are as follows. A novel online learning algorithm
based on stochastic gradient descent for extreme learning machines
is developed. The stability of parameter estimation for dynamic
systems is proved using a Lyapunov stability approach. The
application of the stochastic gradient ELM algorithm to the
complex HCCI engine identification problem is, to the best of our
knowledge, the first application of online learning schemes to
HCCI engines. This includes both the online state estimation
problem and the online operating boundary estimation problem.

The remainder of the article is organized as follows. The ELM
modeling approach is described in Section II, along with algorithm
details on batch (offline) learning as well as the present state
of the art, the OS-ELM. In Section III, the stochastic gradient
based ELM algorithm is derived along with the stability proof. In
Section IV, the background on the HCCI engine and the
experimentation are discussed. Sections V and VI cover the
application of the SG-ELM algorithm to the two case studies,
followed by conclusions in Section VII.


II. Extreme Learning Machines 

The Extreme Learning Machine (ELM) is an emerging learning
paradigm for multi-class classification and regression problems
[29], [30]. An advantage of the ELM method is that the training
speed is extremely fast, thanks to the random assignment of input
layer parameters, which do not require adaptation to the data. In
such a setup, the output layer parameters can be determined
analytically using a least squares approach. Some of the
attractive features of ELM [29] include its universal
approximation capability, a convex optimization problem that
yields the smallest training error without getting trapped in
local minima, a closed form solution that eliminates iterative
training, and better generalization capability [30].

Consider the following data set

$$\{(x_1, y_1), \ldots, (x_N, y_N)\} \in (\mathcal{X}, \mathcal{Y}), \tag{1}$$

where $N$ denotes the number of training samples, $\mathcal{X}$
denotes the space of the input features and $\mathcal{Y}$ denotes
the labels, whose nature differentiates the learning problem at
hand. For instance, if $y$ takes integer values $\{1, 2, 3, \ldots\}$
then the problem is referred to as classification, and if $y$ takes
real values, it becomes a regression problem. ELMs are well suited
for solving both regression and classification problems faster
than state of the art algorithms. A further distinction can be
made depending on the availability of training data during the
learning process, namely offline learning (or batch learning) and
online learning (or sequential learning). Offline learning can
make use of all training data simultaneously, as all data is
available to the algorithm. In addition, as the models are
developed offline, efficient use of available computational
resources can be made, enabling offline algorithms to solve
complex optimization problems. Typically, the accuracy of the
modeling task takes priority over both computational demand and
training time. On the other hand, online learning is preferred in
situations where data arrives as high velocity streams and it is
not feasible to store all the data and make inferences in quick
time, or where inference is made simultaneously with adaptation of
the model to incoming data. In an online learning setting, data is
available one-by-one and needs to be processed with limited
computational effort and storage. Further, inference is required
to be made with each newly available data observation along with
the ones recorded in the past. In this work, the online setting is
considered, and a stable online learning algorithm is proposed and
compared with the offline approach and an existing online learning
method.

A. Batch (Offline) ELM 

When the entire training data is available and a model is required
to be learned using all the training data, batch learning is
adopted. In this case, the ELM algorithm involves solving the
following optimization problem

$$\min_W \left\{ \|HW - Y\|^2 + \lambda \|W\|^2 \right\} \tag{2}$$

$$H^T = \psi\left(W_r^T x(k) + b_r\right) \tag{3}$$

where $\lambda$ represents the regularization coefficient, $Y$
represents the vector of outputs or targets, $\psi$ represents the
hidden layer activation function (sigmoidal, sinusoidal, radial
basis etc. [30]) and $W_r \in \mathbb{R}^{n \times n_h}$,
$W \in \mathbb{R}^{n_h \times y_d}$ represent the input and output
layer parameters respectively. Here, $n$ represents the dimension
of the inputs $x(k)$, $n_h$ represents the number of hidden neurons
of the ELM model, $H$ represents the hidden layer output matrix and
$y_d$ represents the dimension of the outputs $Y$. The matrix $W_r$
consists of randomly assigned elements that map the input vector
to a high dimensional feature space, while $b_r \in \mathbb{R}^{n_h}$
is a bias component assigned in a random manner similar to $W_r$.
The number of hidden neurons determines the expressive power of
the transformed feature space. The elements can be assigned based
on any continuous random distribution [30] and remain fixed during
the learning process. Hence the training reduces to a single step
calculation given by equation (4). The ELM decision hypothesis can
be expressed as in equation (5) for classification and equation
(6) for regression. It should be noted that the hidden layer and
the corresponding activation functions give a nonlinear mapping of
the data; if they are eliminated, the model becomes a linear least
squares (Linear LS) model, which is considered as one of the
baseline models in this study.


$$W^* = \left(H^T H + \lambda I\right)^{-1} H^T Y \tag{4}$$

$$f(x) = \mathrm{sgn}\left(W^T\left[\psi(W_r^T x + b_r)\right]\right) \tag{5}$$

$$f(x) = W^T\left[\psi(W_r^T x + b_r)\right] \tag{6}$$
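
As a minimal sketch of the batch solution in equations (4) and (6),
the following NumPy code draws the random input layer once and
solves the regularized least squares problem for the output
weights. The function names (`elm_hidden`, `elm_fit`,
`elm_predict`), the sigmoid activation, the Gaussian initialization
and the synthetic data are illustrative assumptions and are not
taken from the experiments in this article.

```python
import numpy as np

def elm_hidden(X, W_r, b_r):
    """Hidden layer output H = psi(X W_r + b_r) with a sigmoid psi; shape (N, n_h)."""
    return 1.0 / (1.0 + np.exp(-(X @ W_r + b_r)))

def elm_fit(X, Y, n_h=30, lam=1e-2, seed=0):
    """Batch ELM: random fixed (W_r, b_r), then W* = (H^T H + lam I)^-1 H^T Y."""
    rng = np.random.default_rng(seed)
    W_r = rng.normal(size=(X.shape[1], n_h))   # assumed Gaussian initialization
    b_r = rng.normal(size=n_h)
    H = elm_hidden(X, W_r, b_r)
    W = np.linalg.solve(H.T @ H + lam * np.eye(n_h), H.T @ Y)
    return W_r, b_r, W

def elm_predict(X, W_r, b_r, W):
    """Regression hypothesis of equation (6); wrap in np.sign for equation (5)."""
    return elm_hidden(X, W_r, b_r) @ W

# Toy regression problem (synthetic data, for illustration only).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 2))
Y = np.sin(3.0 * X[:, :1]) + X[:, 1:] ** 2
W_r, b_r, W = elm_fit(X, Y)
rmse = float(np.sqrt(np.mean((elm_predict(X, W_r, b_r, W) - Y) ** 2)))
print(f"training RMSE: {rmse:.4f}")
```

Solving the linear system with `np.linalg.solve` rather than
forming the inverse explicitly is a numerical choice of this
sketch; both give the solution of equation (4).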


Since training involves a linear least squares solution with a
convex objective function, the solution is obtained extremely fast
and is a global optimum for the chosen $n_h$, $W_r$ and $b_r$. The
above formulation for classification (5) is not designed to handle
imbalanced or skewed data sets. To weigh the minority class data
more heavily, a simple weighting method can be incorporated in the
ELM objective function (2) as

$$\min_W \left\{ (HW - Y)^T \Gamma (HW - Y) + \lambda W^T W \right\} \tag{7}$$

$$\Gamma = \mathrm{diag}(\gamma_1, \gamma_2, \ldots, \gamma_N), \qquad
\gamma_i = \begin{cases} 1 & \text{majority class data} \\ r \times f_s & \text{minority class data} \end{cases} \tag{8}$$

where $\Gamma$ represents the weight matrix, $r$ represents the
ratio of the number of majority class data to the number of
minority class data and $f_s$ represents a scaling factor to be
tuned for a given data set [18]. This results in the training step
given by equation (9), and the decision hypothesis takes the same
form as in equation (5):

$$W^* = \left(H^T \Gamma H + \lambda I\right)^{-1} H^T \Gamma Y. \tag{9}$$
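
The weighted training step of equation (9) can be sketched as
follows. The helper names (`class_weights`, `weighted_elm_fit`),
the label convention {-1, +1} and the synthetic data are
assumptions made for illustration; the diagonal weight matrix is
applied by scaling columns so that no N x N matrix is formed.

```python
import numpy as np

def class_weights(y, f_s=1.0):
    """gamma_i = 1 for majority-class rows and r * f_s for minority-class rows.

    Assumes labels in {-1, +1} with both classes present."""
    n_pos = int(np.sum(y > 0))
    n_neg = y.size - n_pos
    minority = 1 if n_pos < n_neg else -1
    r = max(n_pos, n_neg) / min(n_pos, n_neg)   # majority-to-minority ratio
    return np.where(y == minority, r * f_s, 1.0)

def weighted_elm_fit(H, y, gamma, lam=1e-2):
    """W* = (H^T Gamma H + lam I)^-1 H^T Gamma y with Gamma = diag(gamma)."""
    HtG = H.T * gamma                           # H^T Gamma without an N x N matrix
    return np.linalg.solve(HtG @ H + lam * np.eye(H.shape[1]), HtG @ y)

# Synthetic imbalanced data: 90 majority (-1) and 10 minority (+1) samples.
rng = np.random.default_rng(0)
y = np.concatenate([-np.ones(90), np.ones(10)])
H = np.tanh(rng.normal(size=(100, 10)))         # stand-in hidden layer outputs
gamma = class_weights(y, f_s=1.0)
W = weighted_elm_fit(H, y, gamma)
```

With 90 majority and 10 minority samples and `f_s = 1`, each
minority sample receives a weight of 9, so both classes contribute
equally to the weighted squared error.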

B. Online Sequential ELM (OS-ELM) 

The OS-ELM is a recursive version of the batch ELM algorithm. This
version of the algorithm is used for online learning purposes,
where data is processed one-by-one or chunk-by-chunk and the model
parameters are updated, after which the used data is not required
to be stored. In this process, training involves two steps: an
initialization step and a sequential learning step. During the
initialization step, a set of $N_0$ data observations is required
to initialize $H_0$ and $W_0$ by solving the following optimization
problem

$$\min_{W_0} \left\{ \|H_0 W_0 - Y_0\|^2 + \lambda \|W_0\|^2 \right\} \tag{10}$$

$$H_0 = \left[\psi\left(W_r^T x_0 + b_r\right)\right]^T \in \mathbb{R}^{N_0 \times n_h} \tag{11}$$

The solution $W_0$ is given by

$$W_0 = K_0^{-1} H_0^T Y_0 \tag{12}$$

where $K_0 = H_0^T H_0 + \lambda I$. Given another new data
observation $x_1$, the problem becomes

$$\min_{W_1} \left\{ \left\| \begin{bmatrix} H_0 \\ H_1 \end{bmatrix} W_1 - \begin{bmatrix} Y_0 \\ Y_1 \end{bmatrix} \right\|^2 + \lambda \|W_1\|^2 \right\} \tag{13}$$

The solution can be derived as

$$W_1 = W_0 + K_1^{-1} H_1^T (Y_1 - H_1 W_0), \qquad K_1 = K_0 + H_1^T H_1.$$

Based on the above, a generalized recursive algorithm for updating
the least-squares solution can be computed as follows

$$M_{k+1} = M_k - M_k H_{k+1}^T \left(I + H_{k+1} M_k H_{k+1}^T\right)^{-1} H_{k+1} M_k \tag{14}$$

$$W_{k+1} = W_k + M_{k+1} H_{k+1}^T \left(Y_{k+1} - H_{k+1} W_k\right) \tag{15}$$

where $M_k = K_k^{-1}$ represents the covariance of the parameter
estimate.
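
A minimal sketch of the OS-ELM recursion for one-by-one data is
given below. It assumes the standard recursive least squares form
of equations (14) and (15), in which M is the inverse of the
regularized Gram matrix; for a single row the matrix inverse in
(14) reduces to a scalar division. The function names and the
synthetic data are assumptions for illustration.

```python
import numpy as np

def os_elm_init(H0, Y0, lam=1e-1):
    """Initialization step: M_0 = (H_0^T H_0 + lam I)^-1 and W_0 = M_0 H_0^T Y_0."""
    M = np.linalg.inv(H0.T @ H0 + lam * np.eye(H0.shape[1]))
    return M, M @ H0.T @ Y0

def os_elm_step(M, W, h, y):
    """Sequential step for a single hidden-layer row h and target row y."""
    Mh = M @ h
    M = M - np.outer(Mh, Mh) / (1.0 + h @ Mh)   # rank-one form of equation (14)
    W = W + np.outer(M @ h, y - h @ W)          # equation (15)
    return M, W

# Synthetic check: streaming the data reproduces the batch ridge solution.
rng = np.random.default_rng(0)
H = np.tanh(rng.normal(size=(60, 8)))           # stand-in hidden layer outputs
Y = rng.normal(size=(60, 1))
M, W = os_elm_init(H[:10], Y[:10])
for h, y in zip(H[10:], Y[10:]):
    M, W = os_elm_step(M, W, h, y)
W_batch = np.linalg.solve(H.T @ H + 1e-1 * np.eye(8), H.T @ Y)
```

Each step costs on the order of n_h squared operations because the
full matrix M is updated, which is the computational demand that a
gradient based update avoids.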


III. Stochastic Gradient Based ELM Algorithm 

In this section, the proposed online learning algorithm using
stochastic gradient descent (SGD) is developed for extreme
learning machine models, for both classification and regression
problems. SGD methods have been popular for several decades for
performing online learning, but they suffer from poor optimization
accuracy and slow convergence rates. Only recently has the
asymptotic behavior of SGD methods been analyzed, indicating that
SGD methods can be very powerful for learning from large data
sets. SGD based algorithms have been developed for Adaline
networks, perceptron models, K-means, SVM and Lasso. In this work,
the SGD algorithm is developed for extreme learning machines,
showing good potential for online learning of high velocity
(streaming) data.

The justification of SGD based algorithms in machine learning can
be briefly discussed as follows. In any learning problem, three
types of errors are encountered, namely the approximation error,
the estimation error and the optimization error. The expected risk
$E_{exp}(f)$ and the empirical risk $E_{emp}(f)$ for a supervised
learning problem can be given by

$$E_{exp}(f) = \int \ell(f(x), y)\, dP(x, y)$$

$$E_{emp}(f) = \frac{1}{N} \sum_{i=1}^{N} \ell(f(x_i), y_i)$$

Let $f^* = \arg\min_f E_{exp}(f)$ be the best possible prediction
function. In practice, the prediction function is chosen from a
family of parametric functions denoted by $\mathcal{F}$. Let
$f^*_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} E_{exp}(f)$ be
the best prediction function chosen from the parameterized family
of functions $\mathcal{F}$. When a training data set becomes
available, the empirical risk becomes a proxy for the expected
risk for the learning problem. Let
$f_{\mathcal{F}} = \arg\min_{f \in \mathcal{F}} E_{emp}(f)$ be the
solution that minimizes the empirical risk. However, the global
solution is not typically obtained because of computational
limitations, and hence the solution of the learning problem is
reduced to finding
$\tilde{f}_{\mathcal{F}} \approx \arg\min_{f \in \mathcal{F}} E_{emp}(f)$.

Using the above setup, the approximation error ($E_{app}$) is the
error introduced in approximating the true function space with a
family of functions $\mathcal{F}$, the estimation error
($E_{est}$) is the error introduced in optimizing over
$E_{emp}(f)$ instead of $E_{exp}(f)$, and the optimization error
($E_{opt}$) is the error induced as a result of stopping the
optimization at $\tilde{f}_{\mathcal{F}}$. The total error
$E_{tot}$ can be expressed as

$$\begin{aligned}
E_{app} &= E_{exp}(f^*_{\mathcal{F}}) - E_{exp}(f^*) \\
E_{est} &= E_{exp}(f_{\mathcal{F}}) - E_{emp}(f_{\mathcal{F}}) \\
E_{opt} &= E_{emp}(\tilde{f}_{\mathcal{F}}) - E_{emp}(f_{\mathcal{F}}) \\
E_{tot} &= E_{app} + E_{est} + E_{opt}
\end{aligned}$$

The following observations are taken from the asymptotic analysis
of SGD algorithms.

1) The empirical risk $E_{emp}(f)$ is only a surrogate for the
expected risk $E_{exp}(f)$ and hence an increased effort to
minimize $E_{opt}$ may not translate to better learning. In
fact, if $E_{opt}$ is very low, there is a good chance that the
prediction function will over-fit the training data.

2) SGD algorithms are poor optimization algorithms (in terms of
reducing $E_{opt}$) but they minimize the expected risk
relatively quickly. Therefore, in the large scale setup,
when the limiting factor is computational time rather
than the number of examples, SGD algorithms perform
asymptotically better.

3) SGD results in faster convergence when the loss
function has strong convexity properties.

The last observation is key in developing the algorithm based on
ELM models. The ELM models have a squared loss function, and when
the hidden neurons are randomly assigned and fixed, the training
translates to solving a convex optimization problem. Hence the ELM
model is a good candidate for SGD type learning, which is the
motivation for this study. The SGD based algorithm can be derived
for the ELM models as follows.

A. Algorithm Formulation 

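Let $(x_i, y_i)$, $i = 1, 2, \ldots, N$, be the streaming data
under consideration. The data can be considered to be available to
the algorithm from a one-by-one continuous stream or artificially
sampled one-by-one from a very large data set. Let the ELM
empirical risk be defined as follows

$$\begin{aligned}
J(W) &= \min_W \frac{1}{2} \sum_{i=1}^{N} \|y_i - \phi_i^T W\|^2 \\
&= \min_W \left\{ \frac{1}{2}\|y_1 - \phi_1^T W\|^2 + \cdots + \frac{1}{2}\|y_N - \phi_N^T W\|^2 \right\} \\
&= \min_W \left\{ J_1(W) + J_2(W) + \cdots + J_N(W) \right\}
\end{aligned} \tag{16}$$

where $W \in \mathbb{R}^{n_h \times y_d}$,
$y_i \in \mathbb{R}^{1 \times y_d}$ and
$\phi_i \in \mathbb{R}^{n_h \times 1}$ is the hidden layer output
(see $H^T$ in equation (3)). If an error
$e_i \in \mathbb{R}^{1 \times y_d}$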










IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS, VOL. XX, NO. XX, XXXX 2014 


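can be defined as $(y_i - \phi_i^T W)$, the learning objective for
a data observation $i$ can be given by

$$\begin{aligned}
J_i(W) &= \frac{1}{2}\|e_i\|^2 = \frac{1}{2}\|y_i - \phi_i^T W\|^2 \\
\frac{\partial J_i}{\partial W} &= \phi_i \phi_i^T W - \phi_i y_i = \phi_i\left(\phi_i^T W - y_i\right) \\
&= -\phi_i e_i.
\end{aligned} \tag{17}$$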

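In a regular gradient descent (GD) algorithm, the gradient of
$J(W)$ is used to update the model parameters as follows.

$$\begin{aligned}
\frac{\partial J}{\partial W} &= \frac{\partial J_1}{\partial W} + \frac{\partial J_2}{\partial W} + \cdots + \frac{\partial J_N}{\partial W} \\
&= -\phi_1 e_1 - \phi_2 e_2 - \cdots - \phi_N e_N \\
W_{k+1} &= W_k - \Gamma_{SG} \frac{\partial J}{\partial W} \\
&= W_k + \Gamma_{SG}(\phi_1 e_1) + \cdots + \Gamma_{SG}(\phi_N e_N)
\end{aligned} \tag{18}$$

where $k$ is the iteration count and
$\Gamma_{SG} \in \mathbb{R}^{n_h \times n_h}$ represents the step
size or update gain matrix for the GD algorithm.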

It can be seen from equation (18) that the parameter matrix $W$ is
updated based on gradients calculated from all the available
examples. If the number of data observations is large, the
gradient calculation can take enormous computational effort. The
stochastic gradient descent algorithm considers one example at a
time and updates $W$ based on gradients calculated from
$(x_i, y_i)$ as shown in

$$W_{i+1} = W_i + \Gamma_{SG}(\phi_i e_i). \tag{19}$$

From equation (18), it is clear that the optimal $W$ is a function
of gradients calculated from all the examples. As a result, as
more data becomes available, $W$ converges close to its optimal
value in the SGD algorithm. Processing data one-by-one
significantly reduces the computational requirement, and the
algorithm is scalable to large data sets. More importantly, for
the online learning of HCCI engine dynamics considered in this
work, the SGD algorithm becomes a strong candidate.
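
The per-sample update of equation (19) can be sketched as below.
The sketch assumes a scalar gain, i.e. a gain matrix equal to
`gain` times the identity, a sigmoid activation and noise-free
synthetic targets generated from known weights `W_true`; none of
these choices come from the engine experiments.

```python
import numpy as np

def hidden(x, W_r, b_r):
    """phi = psi(W_r^T x + b_r) with a sigmoid psi; shape (n_h,)."""
    return 1.0 / (1.0 + np.exp(-(W_r.T @ x + b_r)))

def sg_elm_step(W, phi, y, gain):
    """Equation (19) with a scalar gain, i.e. Gamma_SG = gain * I (an assumption)."""
    e = y - phi @ W                      # prediction error, shape (y_d,)
    return W + gain * np.outer(phi, e), e

rng = np.random.default_rng(0)
n, n_h, y_d = 3, 20, 1
W_r, b_r = rng.normal(size=(n, n_h)), rng.normal(size=n_h)
W_true = rng.normal(size=(n_h, y_d))     # synthetic "true" output weights
W = np.zeros((n_h, y_d))
gain = 1.0 / n_h                         # sigmoid gives ||phi||^2 < n_h, so gain * ||phi||^2 < 1

err_init = np.linalg.norm(W_true - W)
for _ in range(5000):                    # each sample is used once and then discarded
    x = rng.uniform(-1.0, 1.0, size=n)
    phi = hidden(x, W_r, b_r)
    W, e = sg_elm_step(W, phi, phi @ W_true, gain)
err_final = np.linalg.norm(W_true - W)
print(f"parameter error: {err_init:.3f} -> {err_final:.3f}")
```

The update needs only one outer product per sample, so its cost
grows linearly with the number of hidden neurons, and no matrix is
stored between samples apart from W itself.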

In order to handle class imbalance learning, the algorithm in (19)
can be modified by weighting the minority class data more heavily.
The modified algorithm can be expressed as

$$W_{i+1} = W_i + \Gamma_{imb} \Gamma_{SG}(\phi_i e_i) \tag{20}$$

where $\Gamma_{imb} = r \times f_s$. Here $r$ represents the
imbalance ratio (a running count of majority class data to
minority class data until that instant) and $f_s$ represents the
scaling factor that needs to be tuned to obtain a tradeoff between
high false positives and missed detections for a given
application.
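
One possible way to maintain the running imbalance ratio and apply
equation (20) is sketched below. The class name
`RunningImbalanceGain`, the label convention {-1, +1}, the scalar
gain, and the choice to apply the extra weight only to samples of
the class that is currently in the minority are assumptions made
for illustration.

```python
import numpy as np

class RunningImbalanceGain:
    """Gamma_imb = r * f_s for samples of the current minority class, 1 otherwise.

    r is the running majority-to-minority count ratio; labels are assumed in {-1, +1}."""

    def __init__(self, f_s=1.0):
        self.f_s = f_s
        self.counts = {-1: 0, 1: 0}

    def __call__(self, label):
        self.counts[label] += 1
        n_own, n_other = self.counts[label], self.counts[-label]
        if n_own >= n_other:             # sample belongs to the majority (or a tie)
            return 1.0
        return (n_other / n_own) * self.f_s

def sg_elm_step_imbalanced(W, phi, y, gain, gamma_imb):
    """Equation (20) with Gamma_SG = gain * I (scalar gain is an assumption)."""
    e = y - phi @ W
    return W + gamma_imb * gain * np.outer(phi, e)

# Nine majority samples followed by one minority sample.
weigher = RunningImbalanceGain(f_s=1.0)
gains = [weigher(label) for label in [-1] * 9 + [1]]
```

Since the weight multiplies the update gain, the product of the
weight and the gain is the effective step size, which is worth
keeping in mind when a step-size bound is imposed for stability.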

B. Stability Analysis 

The stability analysis of the SGD based ELM algorithm can be
derived as follows. The ELM structure makes the analysis simple
and similar to that of a linear gradient based algorithm.

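The instantaneous prediction error (here the error $e$ and output
$y$ are transposed as opposed to their previous definition in
Section III-A for ease of derivation) can be expressed in terms of
the parametric error ($\tilde{W} = W^* - W$) as

$$\begin{aligned}
e_i &= y_i - W_i^T \phi_i \\
&= W^{*T} \phi_i - W_i^T \phi_i = \tilde{W}_i^T \phi_i
\end{aligned} \tag{21}$$

where $W^*$ represents the true model parameters. Further, the
parametric error dynamics can be obtained as follows.

$$\begin{aligned}
\tilde{W}_{i+1} &= W^* - W_{i+1} \\
&= W^* - W_i - \Gamma_{SG} \phi_i e_i^T \\
&= \tilde{W}_i - \Gamma_{SG} \phi_i e_i^T
\end{aligned} \tag{22}$$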


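Consider the following positive definite, decrescent and radially
unbounded Lyapunov function $V$

$$V(\tilde{W}) = \mathrm{tr}\left(\tilde{W}^T \Gamma_{SG}^{-1} \tilde{W}\right) \tag{23}$$

where $\mathrm{tr}$ represents the trace of a matrix.

$$\begin{aligned}
\Delta V(\tilde{W}) &= V(\tilde{W}_{i+1}) - V(\tilde{W}_i) \\
&= \mathrm{tr}\left(\tilde{W}_{i+1}^T \Gamma_{SG}^{-1} \tilde{W}_{i+1}\right) - \mathrm{tr}\left(\tilde{W}_i^T \Gamma_{SG}^{-1} \tilde{W}_i\right) \\
&= \mathrm{tr}\left((\tilde{W}_i - \Gamma_{SG}\phi_i e_i^T)^T \Gamma_{SG}^{-1} (\tilde{W}_i - \Gamma_{SG}\phi_i e_i^T)\right) - \mathrm{tr}\left(\tilde{W}_i^T \Gamma_{SG}^{-1} \tilde{W}_i\right) \\
&= \mathrm{tr}\left(-2 \tilde{W}_i^T \phi_i e_i^T + e_i \phi_i^T \Gamma_{SG} \phi_i e_i^T\right) \\
&= \mathrm{tr}\left(-2 e_i e_i^T + e_i \phi_i^T \Gamma_{SG} \phi_i e_i^T\right) \\
&= -2 e_i^T e_i + e_i^T e_i\, \phi_i^T \Gamma_{SG} \phi_i \\
&= -e_i^T M_{SG}\, e_i
\end{aligned} \tag{24}$$

where $M_{SG} = 2 - \phi_i^T \Gamma_{SG} \phi_i$. It can be seen
that $V_{i+1} - V_i \leq 0$ if $M_{SG} > 0$, or
$2 - \phi_i^T \Gamma_{SG} \phi_i > 0$, or

$$0 < \lambda_{max}(\Gamma_{SG}) < 2 \tag{25}$$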


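When (25) is satisfied, $V(\tilde{W}) \geq 0$ is non-increasing in
$i$ and the limit

$$\lim_{i \to \infty} V(\tilde{W}_i) = V_\infty \tag{26}$$

exists. From (24),

$$V_{i+1} - V_i = -e_i^T M_{SG}\, e_i$$

$$\sum_{i=0}^{\infty} (V_{i+1} - V_i) = -\sum_{i=0}^{\infty} e_i^T M_{SG}\, e_i \tag{27}$$

$$\Rightarrow \sum_{i=0}^{\infty} e_i^T M_{SG}\, e_i = V(0) - V_\infty < \infty \tag{28}$$

Also,

$$\sum_{i=0}^{\infty} e_i^T e_i \leq \sum_{i=0}^{\infty} e_i^T M_{SG}\, e_i < \infty \tag{29}$$

when $M_{SG} > 1$ or when

$$\lambda_{max}(\Gamma_{SG}) < 1. \tag{30}$$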
Hence, when (30) is satisfied, $e_i \in L_2$. From (19),
$(W_{i+1} - W_i) \in L_2 \cap L_\infty$. Using the discrete time
Barbalat's lemma,

$$\lim_{i \to \infty} e_i = 0 \tag{31}$$

$$\lim_{i \to \infty} W_{i+1} = W_i \tag{32}$$

Hence, the SGD learning law in (19) guarantees that the estimated
output $\hat{y}_i$ converges to the actual output $y_i$ and the
model parameters $W$ converge to some constant values. The
parameters converge to the true parameters $W^*$ only under
conditions of persistence of excitation in the input signals of
the system (amplitude and frequency richness of $x$). Further,
using the boundedness of $V_i$, $e_i \in L_\infty$, which
guarantees that the online model predictions are bounded as long
as the system output is bounded. As the error between the true
model and the estimation model converges to zero, the estimation
model becomes a one-step ahead predictive model of the nonlinear
system. The SG-ELM algorithm is evaluated by applying it to a
complex HCCI engine identification problem.
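
The Lyapunov decrease in (24) can be checked numerically on
synthetic data. The sketch below assumes a scalar gain (gain matrix
equal to `gain` times the identity), for which the Lyapunov
function reduces to the squared Frobenius norm of the parametric
error divided by the gain, and it evaluates the quantity M_SG from
its definition at every step rather than relying on an eigenvalue
bound. All names and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n_h, gain = 15, 1.0 / 15                 # Gamma_SG = gain * I, so V = ||W_tilde||_F^2 / gain
W_true = rng.normal(size=(n_h, 1))       # synthetic true parameters W*
W = np.zeros((n_h, 1))

def lyapunov(W):
    """V = tr(W_tilde^T Gamma_SG^-1 W_tilde) for the scalar-gain case."""
    W_tilde = W_true - W
    return float(np.sum(W_tilde ** 2)) / gain

dV, predicted = [], []
for _ in range(2000):
    phi = 1.0 / (1.0 + np.exp(-rng.normal(size=n_h)))   # stand-in sigmoid hidden outputs
    e = phi @ W_true - phi @ W                          # noise-free prediction error
    M_sg = 2.0 - gain * float(phi @ phi)                # M_SG = 2 - phi^T Gamma_SG phi
    V_before = lyapunov(W)
    W = W + gain * np.outer(phi, e)                     # SGD learning law (19)
    dV.append(lyapunov(W) - V_before)
    predicted.append(-M_sg * float(e @ e))              # right-hand side of (24)

dV, predicted = np.array(dV), np.array(predicted)
```

In this setting the measured change in V matches the right-hand
side of (24) at every step and is never positive, because the
chosen gain keeps M_SG above 1 for sigmoid outputs in (0, 1).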


IV. Homogeneous Charge Compression Ignition 
Engine 

The algorithms discussed in the preceding sections are applied to
streaming sensory data from a gasoline HCCI engine to demonstrate
an online learning framework for HCCI engine modeling. The engine
specifications are listed in Table I. A schematic of the
experimental setup and instrumentation is shown in Fig. 1. HCCI is
achieved by auto-ignition of the gas mixture in the cylinder. The
fuel is injected early in the intake stroke and given sufficient
time to mix with air, forming a homogeneous mixture. A large
fraction of exhaust gas from the previous cycle is retained to
elevate the temperature and hence the reaction rates of the fuel
and air mixture. The variable valve timing capability of the
engine enables trapping suitable quantities of exhaust gas in the
cylinder.

TABLE I: Specifications of the experimental HCCI engine

Engine Type          4-stroke In-line
Fuel                 Gasoline
Displacement         2.0 L
Bore/Stroke          86/86 mm
Compression Ratio    11.25:1
Injection Type       Direct Injection
Valvetrain           Variable Valve Timing with hydraulic cam phaser
                     having 119 degree constant duration defined at
                     0.25 mm lift, 3.5 mm peak lift and 50 degree
                     crank angle phasing authority
HCCI strategy        Exhaust recompression using negative valve overlap


The engine can be controlled using precalculated inputs such as
injected fuel mass (FM in mg/cyc), crank angle at intake valve
opening (IVO), crank angle at exhaust valve closing (EVC) and
crank angle at start of fuel injection (SOI). The valve events are
measured in degrees after exhaust top dead center (deg eTDC),
while SOI is measured in degrees after combustion top dead center
(deg cTDC). Other important physical variables that influence the
performance of HCCI combustion include intake manifold temperature
$T_{in}$, intake manifold pressure $P_{in}$, mass flow rate of air
at intake $\dot{m}_{in}$, exhaust gas temperature $T_{ex}$, exhaust
manifold pressure $P_{ex}$, coolant temperature $T_c$, fuel to air
ratio (FA), etc. The engine performance metrics are given by
combustion phasing, indicated by the crank angle at 50% mass
fraction burned (CA50), and combustion work output, given by net
indicated mean effective pressure (NMEP, sometimes abbreviated as
IMEP). Combustion features such as CA50 and NMEP are determined
from the high speed in-cylinder pressure measurements. For further
reading on HCCI combustion and related variables, please refer to
[37].

A. Experiment Design 

In order to identify both models for the HCCI state variables and
models for the dynamic operating boundary in transient operation,
an appropriate experiment design to obtain transient data from the
engine is required. The modeled variables, such as engine states
and operating envelope, are dynamic variables, and in order to
capture both transient and steady state behavior, a set of dynamic
experiments is conducted at constant rotational speeds and
naturally aspirated conditions (no supercharging/turbocharging) by
varying FM, IVO, EVC and SOI in a uniformly random manner. Every
input step involves the engine making a transition between two set
conditions, and the transition (transients or dynamics) is
recorded as temporal data. In order to capture several such
transients, an amplitude modulated pseudo-random binary sequence
(A-PRBS) has been used to design the excitation signals. A-PRBS
enables exciting the engine at different amplitudes and
frequencies suitable for the identification problem considered in
this work. The data is sampled using the AVL Indiset acquisition
system, where in-cylinder pressure is sensed every crank angle,
from which the combustion features NMEP and CA50 are determined on
a per-combustion cycle basis. More details on HCCI combustion and
experiments can be found in [18].

B. HCCI Instabilities 

A subset of the data collected from the engine is shown 
in Pig. 1^ where it can be observed that for some combina¬ 
tions of the inputs (left figures), the HCCI engine misfires 
(seen in the right figures where NMEP drops below 0 bar). 
HCCI operation is limited by several phenomena that lead to 
undesirable engine behavior. As described in lf38l . the HCCI 
operating range is conceptually constrained to a small region 
of permissible unburned (pre-combustion) and burned (post¬ 
combustion) charge temperature states. As previously noted, 
sufficiently high unburned gas temperatures are required to 
achieve ignition in the HCCI operating range without which 
complete misfire will occur. If the resulting combustion cannot 
achieve sufficiently high burned gas temperatures, commonly 
occurring in conditions with low fuel to diluent ratios or late 
combustion phasing, various degrees of quenching can occur 
resulting in reduced work output and increased hydrocarbon 
and carbon monoxide emissions. Under some conditions, this 













[Figure 1 image not reproduced. Recoverable labels: throttle, intake manifold, exhaust manifold, coolant; a partly legible instrumentation legend (items 8 to 15) lists fuel injector, cam phaser, spark plug, lambda sensor, air mass flow sensor (kg/h), RPM sensor (RPM), in-cylinder pressure transducer (bar), and thermocouples and pressure transducers (mbar) for the intake manifold, exhaust manifold, cylinder 1 runner and post-turbine locations.]


Fig. 1: A schematic of the HCCI engine setup and instrumentation (only relevant instrumentation shown). 


may lead to high cyclic variation due to the positive feedback
loop existing through the trapped residual gas [?], [?].
Operation with high burned gas temperature, although stable
and commonly reached at higher fueling rates where the
fuel-to-diluent ratio is also high, yields high heat release and thus
pressure rise rates that may pose challenges for engine noise
and durability constraints. A discussion of the temperatures at
which these phenomena occur may be found in [?].

C. Learning The HCCI Engine Data 

In the HCCI modeling problem, both the inputs and the 
outputs of the engine are available as sensor measurements and 
hence supervised learning can be employed. The HCCI engine 
is a nonlinear dynamic system and sensor measurements 
represent discrete time sequences. The input-output behavior 
can be modeled using a nonlinear auto regressive model with 
exogenous input (NARX) [39] as follows

$$ y(k) = f_{NARX}[u(k-1), \ldots, u(k-n_u), y(k-1), \ldots, y(k-n_y)] \tag{33} $$

where $u(k) \in \mathbb{R}^{u_d}$ and $y(k) \in \mathbb{R}^{y_d}$ represent the inputs and
outputs of the system respectively, $k$ represents the discrete
time index, $f_{NARX}(\cdot)$ represents the nonlinear function mapping
specified by the model, $n_u$ and $n_y$ represent the number of
past input and output samples required (order of the system),
while $u_d$ and $y_d$ represent the dimensions of the inputs and outputs
respectively. Let $x$ represent the augmented input vector
obtained by appending the input and output measurements from
the system.

$$ x = [u(k-1), \ldots, u(k-n_u), y(k-1), \ldots, y(k-n_y)]^T \tag{34} $$

The input measurement sequence can be converted to the form
of training data

$$ \{(x_1, y_1), \ldots, (x_N, y_N)\} \in (\mathcal{X}, \mathcal{Y}) \tag{35} $$

where $N$ denotes the number of training samples, $\mathcal{X}$ denotes
the space of the input features and $\mathcal{Y}$ the space of the outputs
(here $\mathcal{X} = \mathbb{R}^{u_d n_u + y_d n_y}$, with $\mathcal{Y} = \mathbb{R}$ for regression
and $\mathcal{Y} = \{+1, -1\}$ for binary classification). The above
conversion of system measurements
to training data is a natural definition for a series-parallel
model architecture, and the models can be used for one-step-ahead
prediction (OSAP), i.e., given a set of measurements
until time index $k$, the model predicts the output at time
$k + 1$ (see equation (36)). A parallel architecture, on the other
hand, can be used to perform multiple-step-ahead predictions
(MSAP) by feeding back the predictions of the OSAP model
in a recurrent manner (see equation (37)). The series-parallel
and parallel architectures are well explained in [40].

Fig. 2: A subset of the HCCI engine experimental data showing
A-PRBS inputs and engine outputs. The misfire regions are
shown in dotted rectangles. The data is indexed by combustion
cycles.

$$ y(k+1) = f_{NARX}[u(k), \ldots, u(k-n_u+1), y(k), \ldots, y(k-n_y+1)] \tag{36} $$

$$ y(k+n_{pred}) = f_{NARX}[u(k+n_{pred}-1), \ldots, u(k-n_u+n_{pred}), y(k+n_{pred}-1), \ldots, y(k-n_y+n_{pred})] \tag{37} $$

The OSAP model is used for training because existing simple
training algorithms can be used; once the model becomes
accurate for OSAP, it can be converted to an MSAP model in
a straightforward manner. The MSAP model can be used for
making long-term predictions useful for predictive control
[?], [?].
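The difference between the series-parallel (OSAP) and parallel (MSAP) architectures can be sketched as follows. This is a minimal illustration under stated assumptions: the helper names `build_narx_data` and `msap_rollout`, the toy linear dynamics, and the least-squares stand-in for $f_{NARX}$ are all hypothetical and are not the models used in the paper.

```python
import numpy as np

def build_narx_data(u, y, n_u=1, n_y=1):
    """Series-parallel (OSAP) pairs: x stacks past inputs and past *measured*
    outputs as in (34); the target is the current output y(k)."""
    start = max(n_u, n_y)
    X, Y = [], []
    for k in range(start, len(y)):
        past_u = [u[k - i] for i in range(1, n_u + 1)]
        past_y = [y[k - i] for i in range(1, n_y + 1)]
        X.append(np.concatenate(past_u + past_y))
        Y.append(y[k])
    return np.array(X), np.array(Y)

def msap_rollout(f, u, y_init, n_u=1, n_y=1):
    """Parallel (MSAP) prediction: the model's own past predictions replace
    the measured outputs in the regressor. `f` maps a regressor x to y."""
    start = max(n_u, n_y)
    y_hat = [np.asarray(v, dtype=float) for v in y_init[:start]]
    for k in range(start, len(u)):
        past_u = [u[k - i] for i in range(1, n_u + 1)]
        past_y = [y_hat[k - i] for i in range(1, n_y + 1)]
        y_hat.append(f(np.concatenate(past_u + past_y)))
    return np.array(y_hat)

# Toy system with 3 inputs and 2 outputs (dynamics are an assumption)
rng = np.random.default_rng(0)
T = 200
u = rng.uniform(-1, 1, size=(T, 3))
y = np.zeros((T, 2))
A = np.array([[0.5, 0.1], [0.0, 0.3]])
B = 0.2 * rng.normal(size=(3, 2))
for k in range(1, T):
    y[k] = y[k - 1] @ A + u[k - 1] @ B

X, Y = build_narx_data(u, y)                      # one-step-ahead training set
theta, *_ = np.linalg.lstsq(X, Y, rcond=None)     # linear stand-in for f_NARX
y_msap = msap_rollout(lambda x: x @ theta, u, y[:1])
```

Training uses measured outputs in the regressor, so any standard regression algorithm applies; only the rollout changes when the trained model is used for multi-step prediction.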

V. Application Case Study 1: Online Regression Learning for System Identification of an HCCI Engine

As mentioned earlier, a key requirement for model based 
control of the HCCI engine is the ability to accurately predict 
the engine state variables for several operating cycles ahead 
of time, so that a control action with a known impact can be 
applied to the engine. The state variables of an engine are 
the fundamental quantities that represent the engine’s state of 
operation. As a consequence, these variables also influence the 
performance of the engine such as fuel efficiency, emissions 
and stability, and are required to be monitored/regulated. In 
this section, the NMEP and CA50 are considered indicative 
of engine state variables and are estimated based on control 
inputs alone, so that the resulting models can be used for
predictive control. This section details the experiments, model
training and validation of the identified models. 

For the HCCI control-oriented modeling, an online regression
learning framework is developed. In contrast to the existing
linear system identification [?], a nonlinear identification
is employed. Typical features of nonlinear identification, such
as slow convergence and complex parameter updates, make
existing methods practically unsuitable for complex systems.
In this work, these shortcomings are addressed, making the
approach suitable for the complex HCCI engine problem at
hand.

A. Model Structure and Evaluation Metric 

For the purpose of demonstration, the variables NMEP
and CA50 are considered as outputs, whereas the control
variables, namely fueling (FM), exhaust valve closing (EVC)
and fuel injection timing (SOI), are considered inputs. Transient
data from the HCCI engine at a constant speed of
1800 RPM and naturally aspirated conditions is used. A
NARX model as shown in Section IV-C is considered, where
$u = [FM\ EVC\ SOI]^T$ and $y = [NMEP\ CA50]^T$, with $n_u$
and $n_y$ chosen as 1 (tuned by trial and error). The nonlinear
model approximating $f_{NARX}$ is initialized to an extreme
learning machine model with random input layer weights and
random values for the covariance matrices and output layer
weights. Four different models are considered: the
state-of-the-art OS-ELM algorithm, the proposed SG-ELM
algorithm, a baseline offline (batch) ELM (O-ELM) and a
baseline linear system identification model. The purpose of
the baseline offline ELM is to evaluate how completely
the online learning models learn the HCCI behavior
compared with an offline ELM model. The offline
ELM model is expected to be accurate, as it
has sufficient time and computation and uses all training
data simultaneously to learn the HCCI behavior.
The purpose of the linear baseline model is to justify
the use of a nonlinear model for HCCI dynamics.
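A minimal sketch of the offline (batch) ELM baseline is given below for orientation. The sigmoid activation, the uniform weight range, the small ridge term and the class name `BatchELM` are assumptions made for illustration; the toy data only mimics the regressor size of a NARX model with three inputs, two outputs and $n_u = n_y = 1$.

```python
import numpy as np

class BatchELM:
    """Offline ELM sketch: fixed random hidden layer, least-squares output layer."""

    def __init__(self, n_inputs, n_hidden=100, ridge=1e-6, seed=0):
        rng = np.random.default_rng(seed)
        self.W_in = rng.uniform(-1, 1, size=(n_inputs, n_hidden))  # random, never trained
        self.b = rng.uniform(-1, 1, size=n_hidden)
        self.ridge = ridge          # small regularizer for numerical stability (assumption)
        self.W_out = None

    def hidden(self, X):
        # Sigmoid hidden units (assumed activation)
        return 1.0 / (1.0 + np.exp(-(X @ self.W_in + self.b)))

    def fit(self, X, Y):
        H = self.hidden(X)
        A = H.T @ H + self.ridge * np.eye(H.shape[1])
        self.W_out = np.linalg.solve(A, H.T @ Y)   # regularized least squares
        return self

    def predict(self, X):
        return self.hidden(X) @ self.W_out

# Toy regression problem with 5-dimensional regressors (data is synthetic)
rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(500, 5))
Y = np.sin(X.sum(axis=1, keepdims=True))
model = BatchELM(n_inputs=5).fit(X, Y)
train_rmse = float(np.sqrt(np.mean((model.predict(X) - Y) ** 2)))
```

Only the output layer is solved for, which is why the batch solution reduces to a single linear least-squares problem over all training data.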

All the nonlinear models consist of 100 hidden units with
fixed randomized input layer parameters. About 11000 cycles
of data are processed one by one as sampled by the engine
ECU, and the model parameters are updated sequentially.
After the training phase, the parameter update is switched
off and the models are evaluated on the next 5100 cycles
of data for one-step-ahead predictions. Further, to evaluate whether
the learned models represent the actual HCCI dynamics, the
multi-step-ahead predictions of the models are compared using
about 600 cycles of data. It should be noted that both the
one-step-ahead and multi-step-ahead evaluations were done using
data unseen during the training phase.

The parameters of each of the models are tuned to accurately
represent the given dataset. As recommended for OS-ELM
[?], about 800 cycles of data were used for initializing the
output layer parameters $W_0$ and covariance matrix $M_0$ (see
equations (14) and (15)). The initialization was performed
using the batch ELM algorithm [?]. In order to have a fair
comparison, $W_0$ is used as the initial condition for both
OS-ELM and SG-ELM. The only parameter of SG-ELM, namely
the gradient step size, was tuned to $\Gamma_{SG} = 0.0008\, I_{100}$ for
best accuracy. This was determined by trial and error, and
the value of $\Gamma_{SG}$ had a significant impact on the prediction
accuracy. A detailed analysis of the robustness to $\Gamma_{SG}$ is
outside the scope of this paper and is left for
future work.
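The contrast between the two online updates can be sketched as follows. The first function is the standard recursive least squares recursion on which OS-ELM is based; the second is a plain LMS-style gradient step with a scalar step size (that is, $\Gamma = \gamma I$), used here as a stand-in and not necessarily identical to the SG-ELM law derived in Section III. Function names, the toy data and the chosen step size are hypothetical.

```python
import numpy as np

def os_elm_step(W, M, h, y):
    """One recursive least squares update for a single sample.
    W: (L, m) output weights, M: (L, L) symmetric covariance, h: (L,) hidden output."""
    Mh = M @ h
    M = M - np.outer(Mh, Mh) / (1.0 + h @ Mh)   # covariance update, O(L^2)
    W = W + np.outer(M @ h, y - h @ W)          # weight update driven by the error
    return W, M

def sg_step(W, h, y, gamma):
    """One plain stochastic gradient update; no covariance matrix is stored."""
    return W + gamma * np.outer(h, y - h @ W)   # O(L) work per output

# Toy comparison on a noise-free linear-in-features target (synthetic data)
rng = np.random.default_rng(0)
L, m = 10, 2
W_true = rng.normal(size=(L, m))
W_rls, M = np.zeros((L, m)), 100.0 * np.eye(L)
W_sg = np.zeros((L, m))
for _ in range(2000):
    h = rng.uniform(0, 1, size=L)      # stand-in for a hidden-layer output
    y = h @ W_true
    W_rls, M = os_elm_step(W_rls, M, h, y)
    W_sg = sg_step(W_sg, h, y, gamma=0.05)

err0 = np.linalg.norm(W_true)
err_rls = np.linalg.norm(W_rls - W_true)
err_sg = np.linalg.norm(W_sg - W_true)
```

The gradient step avoids storing and updating the $L \times L$ covariance matrix, which is the source of the computational saving, at the price of a step size that has to be tuned.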

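The performance of the models is measured using the normalized
root mean squared error (RMSE) given by

$$ RMSE = \sqrt{\frac{1}{n\, y_d} \sum_{i=1}^{n} \sum_{j=1}^{y_d} \left( y_j^i - \hat{y}_j^i \right)^2} \tag{38} $$

where $n$ is the number of evaluated samples and both $y_j^i$ and $\hat{y}_j^i$ are normalized to lie between -1 and
+1.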
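As a small worked example of the metric, the sketch below computes the RMSE of (38) for outputs arranged as an array of shape (number of samples, number of outputs). The helper names and the min-max scaling used for normalization are assumptions for illustration.

```python
import numpy as np

def normalize(y, y_min, y_max):
    """Scale values to [-1, 1] given known lower and upper limits (assumed min-max scaling)."""
    return 2.0 * (y - y_min) / (y_max - y_min) - 1.0

def rmse(y_true, y_pred):
    """Root mean squared error averaged over all samples and all outputs."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Two samples, two outputs: squared errors are 1, 0, 0, 1, so the mean is 0.5
y_true = [[1.0, -1.0], [0.0, 0.0]]
y_pred = [[0.0, -1.0], [0.0, 1.0]]
value = rmse(y_true, y_pred)
scaled = normalize(np.array([0.0, 5.0, 10.0]), 0.0, 10.0)
```

Because both outputs are scaled to the same range before the error is computed, neither NMEP nor CA50 dominates the metric through its physical units.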


B. Results and Discussion 

On performing online learning, it can be observed from Fig.
3 that the parameters of OS-ELM grow more aggressively
compared to those of SG-ELM. Although both models have the
same initial conditions, the step size parameter $\Gamma_{SG}$ of
SG-ELM gives additional control over the parameter growth and
keeps the parameters bounded, as proved in Section III-B. On the other
hand, OS-ELM has no such control over the parameter
evolution, which is governed by the evolution of the covariance
matrix $M$ (14). The covariance matrix $M$ is expected
to add stability to the parameter evolution, but in practice
it tends to be more aggressive, leading to potential instabilities
as reported in [22], [23], [24], [25]. As a consequence, the
parameter values for SG-ELM remain small compared to
OS-ELM (the norm of the estimated parameters is
16.64 for OS-ELM and 3.71 for SG-ELM). This has a significant implication
in statistical learning theory [42]: a small norm of model
parameters implies a simpler model, which results in good
generalization. Although this effect is slightly reflected in
the prediction results summarized in
Table II (the MSAP RMSE of SG-ELM is the lowest), the improvement
is not significant for this problem, possibly because
of incomplete convergence. The value of $\Gamma_{SG}$ has to be
tuned correctly, and sufficient training data provided, in order to
ensure parameter convergence. Ultimately, the online learning
mechanism is intended to run along with the engine, and hence the
slow convergence may not be an issue in a vehicle application.


TABLE II: Performance comparison of OS-ELM and SG-ELM
for the HCCI online regression learning problem. A baseline
linear model and an offline trained ELM model (O-ELM) are
also included for comparison.

Algorithms   Training Time (s)   OSAP RMSE   MSAP RMSE
Linear       0.3523              0.2004      0.1664
OS-ELM       3.3812              0.0957      0.1024
SG-ELM       0.7269              0.1047      0.0939
O-ELM        -                   0.1015      0.1003


The prediction results as well as the training time for the online
models are compared in Table II. It can be observed that the
computational time for SG-ELM is significantly lower (about 4.6
times) than for OS-ELM, indicating that SG-ELM learns
faster. The reduction in computation is expected to
be more pronounced as the dimension and complexity of the
data increase. It can be seen from Table II that the one-step-ahead
prediction accuracies (OSAP RMSE) of the nonlinear
models are similar, with OS-ELM winning marginally. On
the other hand, the multi-step prediction accuracies (MSAP
RMSE) are similar for the nonlinear models, with SG-ELM
performing marginally better. The MSAP accuracy reflects the
generalization performance of the model and is more crucial
for the modeling problem, as the models ultimately feed their
predictions to a predictive control framework that requires
accurate and robust predictions of the engine several steps
ahead of time. From our understanding of model complexity
and generalization error, a model that is less complex
(indicated by a minimum norm of parameters [30], [33]) tends to
generalize better, which is again demonstrated by SG-ELM.
The performance of the linear baseline model is significantly
worse than that of the nonlinear models, justifying the adoption of
nonlinear identification for the HCCI engine problem.

The MSAP predictions of the models are summarized in
Fig. 4, where model predictions for NMEP and CA50
are compared against real experimental data. Here the model
is initialized using the experimental data at the first instant
and allowed to make predictions recursively for several steps
ahead. It can be seen that the nonlinear models outperform
the linear model and, at the same time, the online learning
models perform similarly to the offline trained models, indicating
that online learning can fully identify the engine behavior
at the operating condition where the data is collected. It
should be noted that this task is a case of multi-input
multi-output modeling, which adds some limitations to the SG-ELM
method. When the model complexity increases, SG-ELM
requires more excitation for convergence, as opposed to
OS-ELM, which converges more aggressively (although at the loss
of stability). Further, the tuning of the gradient step size $\Gamma_{SG}$ can
be time-consuming for systems predicting multiple outputs
with different noise characteristics. OS-ELM, on the other
hand, is more elegant in this respect, as there are no parameters to be
tuned once it is properly initialized.


VI. Application Case Study 2: Online Classification Learning (With Class Imbalance) for Identifying the Dynamic Operating Envelope of an HCCI Engine

The problem considered in this case study is to develop
a predictive model of the dynamic operating envelope of the
HCCI engine. For developing stable model-based controllers for
HCCI engines, it is necessary to prevent the engine from drifting
towards instabilities such as misfire, ringing, knock, etc. [?],
[?]. To this end, a dynamic operating envelope of the HCCI
engine was developed using machine learning models [?].
However, the modeling was performed offline. In this paper,
an online learning framework for modeling the operating
envelope of the HCCI engine is developed using both OS-ELM
and SG-ELM algorithms.

In this paper, the operating envelope defined by two common
HCCI unstable modes, a complete misfire and a high
variability combustion (a more detailed description is given
in Section IV-B), is studied. The problem of identifying the
HCCI operating envelope using experimental data can be
posed as a classification problem. The engine sensor data can
be manually labeled as stable or unstable depending on
engine-based heuristics. Further, the engine dynamic data consists
of a large number of stable class data compared to unstable
class data, which introduces an imbalance in class proportions.
As a result, the problem can be posed as class imbalance
learning of a binary classification decision boundary. For class
imbalance learning, a cost-sensitive approach that modifies the
objective function of the learning system to weigh the minority
class data more heavily is preferred over under-sampling and
over-sampling approaches [?].

Fig. 3: Comparison of parameter evolution for the OS-ELM and SG-ELM algorithms during online learning. A zoomed-in
plot shows that the parameter update for OS-ELM is more aggressive compared to SG-ELM. Both OS-ELM and SG-ELM
are initialized to the same parameters. The less aggressive and slow variation of the SG-ELM parameters along with stability
bounds result in better regularization compared to OS-ELM.


Online learning algorithms using OS-ELM and SG-ELM are
compared for classification performance. The above nonlinear
models are compared against a baseline linear classification 
model and an offline trained nonlinear ELM model to make 
similar justifications as in the previous case study. The linear 
baseline model is included to justify the benefits of adopting 
a nonlinear model while the offline trained model is included 
to show the effectiveness of online algorithms in capturing the 
underlying behavior. 


A. Model Structure and Evaluation Metric 

The HCCI operating envelope is a function of the engine
control inputs and engine physical variables such as temperature,
pressure, flow rate, etc. Also, the envelope is a dynamic
system, so a predictive model requires the measurement
history up to an order of $N_h$. The dynamic classifier model
can be given by

$$ \hat{y}_{k+1} = \mathrm{sgn}(f(x_k)) \tag{39} $$

where $\mathrm{sgn}(\cdot)$ represents the sign function, $\hat{y}_{k+1}$ indicates
the model prediction for the future cycle $k + 1$, $f(\cdot)$ can take
any structure depending on the learning algorithm, and $x_k$ is
given by

$$ x_k = [IVO, EVC, FM, SOI, T_{in}, P_{in}, \dot{m}_{in}, T_{ex}, P_{ex}, T_c, FA, NMEP, CA50]^T \tag{40} $$

at cycle $k$ up to cycle $k - N_h + 1$. In the following sections,
the function $f(\cdot)$ is learned from the available engine
experimental data using the two online ELM algorithms. The
engine measurements and their time histories (defined by $x_k$)
are considered inputs to the model, while the stability labels
are considered outputs. The feature vector is of dimension
$n = 39$ and includes sensor measurements such as FM, IVO, EVC,
SOI, $T_c$, $T_{in}$, $P_{in}$, $\dot{m}_{in}$, $T_{ex}$, $P_{ex}$, NMEP, CA50 and FA
along with $N_h = 1$ cycles of history (see (40)).

Fig. 4: Prediction results of the SG-ELM algorithm showing CA50, IMEP and one input variable (fueling) for 2 unseen data
sets.

The engine experimental data is split into training and testing sets. The
training set consists of about 14300 cycles of data processed
one by one as sampled by the engine ECU. After the training
phase, the parameter update is switched off and the models
are evaluated on the next 6200 cycles of data for one-step-ahead
classification. The ratio of the number of majority class
data to the number of minority class data ($r$) is
about 4.5:1 for the training set and 9:1 for the testing set. The nonlinear
model approximating $f(\cdot)$ is initialized to an extreme learning
machine model with random input layer weights and random
values for the covariance matrices and output layer weights.
All the nonlinear models consist of 10 hidden units with fixed
randomized input layer parameters. Similar to the previous
case study, a small portion of the training data is used to
initialize the ELM model parameters as well as the covariance
matrix. The SG-ELM parameter $\Gamma_{SG}$ is tuned to
$0.001\, I_{10}$ by trial and error. A weighted classification
version of the algorithms is developed to handle the class
imbalance problem. The minority class data is weighted higher,
by $r$ times $f_s$, where $r$ is the imbalance ratio of the training data,
computed online as the ratio of the number of majority
class to the number of minority class data until that instant.
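The online computation of the imbalance ratio can be sketched as below. This is a minimal illustration, assuming labels +1 and -1 and a unit weight for majority samples; the class name `OnlineImbalanceWeight` and the `scale` argument (standing in for an additional scaling of the minority weight) are hypothetical, and how the weight enters the OS-ELM or SG-ELM update is not shown.

```python
class OnlineImbalanceWeight:
    """Tracks class counts seen so far and returns a per-sample weight."""

    def __init__(self, scale=1.0):
        self.scale = scale               # extra scaling of the minority weight (assumption)
        self.counts = {+1: 0, -1: 0}     # samples seen so far per class label

    def update(self, label):
        """Register one labeled sample and return the weight for its error term."""
        self.counts[label] += 1
        n_major = max(self.counts.values())
        n_minor = min(self.counts.values())
        if n_minor == 0 or n_major == n_minor:
            return 1.0                   # no imbalance information yet
        r = n_major / n_minor            # imbalance ratio up to this instant
        return self.scale * r if self.counts[label] == n_minor else 1.0

tracker = OnlineImbalanceWeight()
weights = [tracker.update(lbl) for lbl in [+1, +1, +1, -1, +1, -1]]
```

In a cost-sensitive update, such a weight would typically multiply the error contribution of the current sample, so that rare unstable cycles influence the decision boundary as much as the abundant stable ones.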

For the class imbalance problem considered here, a conventional
classifier metric like the overall misclassification rate
cannot be used, as it would favor a biased classifier, i.e., one
that ignores the minority class data.
For instance, on a data set that has 95% majority class data
(with label +1), a classifier would achieve 95% classification accuracy
by predicting all the labels to be +1, which is obviously
undesirable. Hence the following evaluation metric used for
skewed data sets is considered. Let TP and TN represent
the total number of positive and negative class data classified
correctly by the classifier. If $N^+$ and $N^-$ represent the
total number of positive and negative class data respectively,
the true positive rate (TPR) and true negative rate (TNR), the
geometric mean (GM) of TPR and TNR, and the total accuracy
(TA) of the classifier can be defined as follows [43]. It should
be noted that the total accuracy and geometric mean weight
the accuracy of majority and minority classes equally, i.e., they
have high values only when both classes of data are classified
correctly.


$$ TPR = \frac{TP}{N^+}, \quad TNR = \frac{TN}{N^-}, \quad GM = \sqrt{TPR \times TNR}, \quad TA = 0.5\,(TPR + TNR). \tag{41} $$
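The metrics in (41) and the 95% example given earlier can be checked with a short sketch. The function name `imbalance_metrics` is hypothetical, and the code assumes labels +1 and -1 with both classes present in the data.

```python
import numpy as np

def imbalance_metrics(y_true, y_pred):
    """TPR, TNR, geometric mean and total accuracy for +1 / -1 labels."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    pos, neg = y_true == 1, y_true == -1
    tpr = float(np.mean(y_pred[pos] == 1))     # TP / N+
    tnr = float(np.mean(y_pred[neg] == -1))    # TN / N-
    return {"TPR": tpr, "TNR": tnr,
            "GM": float(np.sqrt(tpr * tnr)), "TA": 0.5 * (tpr + tnr)}

# A classifier that always predicts +1 on a data set with 95% majority class
y_true = np.array([1] * 95 + [-1] * 5)
y_pred = np.ones(100, dtype=int)
plain_accuracy = float(np.mean(y_true == y_pred))
scores = imbalance_metrics(y_true, y_pred)
```

Here the plain accuracy is 0.95, while TNR and GM are 0 and TA is 0.5, which exposes the classifier that ignores the minority class.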


B. Results and Discussion 

The results of online imbalance classification are summarized
in Table III, where computational time as well as
classification performance can be compared. It can be observed
that the developed classification models perform well for the
HCCI boundary identification problem (the total accuracies
of all models are above 80%). The problem is mildly nonlinear,
as linear models achieve accuracies similar to those of their
nonlinear counterparts. Both OS-ELM and SG-ELM perform
well and achieve results similar to an offline model, indicating
completeness of learning. SG-ELM has a slight advantage
in terms of computational efficiency: the algorithm is simple
and requires about half of the time required to train an
OS-ELM model. Further, for the considered classification problem,
the prediction accuracy of SG-ELM is slightly better than that of
OS-ELM, indicating the suitability of SGD-based online learning
for the HCCI problem. A subtle advantage observed for
OS-ELM is that, although its combined accuracy is slightly
inferior to that of SG-ELM, its accuracies on the positive
examples and negative examples are very close to each other,
indicating that the model is well balanced and predicts both
majority class and minority class data well. SG-ELM,
on the other hand, in spite of fine-tuning the parameters,
fails to achieve this. Further tuning can be done to improve
the accuracy on a particular class of data, typically sacrificing
some accuracy on the other. The predictions of the
online SG-ELM model are shown in Fig. 5.

TABLE III: Performance comparison of the nonlinear models
(OS-ELM and SG-ELM) for the online class imbalance learning
problem. A baseline linear model and an offline trained
ELM model (O-ELM) are also used for comparison.

Algorithms   Training Time (s)   TPR      TNR      Total Accuracy   GM Accuracy
Linear       0.58                0.9982   0.6374   0.8178           0.7977
OS-ELM       0.58                0.8328   0.8341   0.8335           0.8335
SG-ELM       0.30                0.9876   0.7707   0.8792           0.8725
O-ELM        -                   0.8265   0.8569   0.8417           0.8416


The models developed using the OS-ELM and SG-ELM algorithms
are used to make predictions on unseen engine
inputs; class predictions are summarized in Fig. 5, while
quantitative results are included in Table III. As mentioned
earlier, the operating envelope is a decision boundary in the
input space within which any input operates the HCCI engine in
a stable manner, and any input outside the envelope might
operate the engine in an unstable manner. The HCCI state
variables such as NMEP and CA50 and engine sensor observations
such as $T_{in}$, $P_{in}$, $\dot{m}_{in}$, $T_{ex}$, $P_{ex}$, $T_c$ at time instant $k$, along
with engine control inputs such as FM, EVC and SOI at time
instant $k + 1$, are given as input to the models (see (40)). The
model predictions at time $k + 1$ are obtained. The engine’s
actual response at time $k + 1$ is also recorded. A data point is
marked in red if the model predicts the engine operation to be
unstable (-1), while it is marked in green if the model predicts
the data point to be stable (+1). In the figures, a dotted line in
the NMEP plot indicates the misfire limit, a dotted ellipse in the
CA50 plot indicates the high variability instability mode, while a
dotted rectangle indicates misclassified predictions by the model.
To understand the variation of NMEP and CA50 with changes
in control inputs, the fueling input (abbreviated as FM) is also
included in the plots. It should be understood that FM is not
the only input for prediction and the signals are defined as in
equation (40), but only the fueling input is shown in the plots
owing to space constraints.

It can be seen from the above plots that, as a whole, both
OS-ELM and SG-ELM models classify the HCCI engine data
fairly well in spite of the high-amplitude noise inherent in the
HCCI experimental data. The data consists of step changes in
FM, EVC and SOI, and whenever a ‘bad’ combination of inputs
is chosen, the engine either misfires completely (NMEP
falls below the misfire limit) or exhibits high variability combustion
(see dotted ellipses). The goal of this work, as stated previously,
is to predict whether a future HCCI combustion event is stable
or unstable based on available measurements. The results
summarized in Table III indicate that the developed models
indeed accomplish this goal with reasonable accuracy.
From Fig. 5, it is observed that OS-ELM has some clear
misclassifications in predicting stable class data (see dotted
rectangles in the plots), while this is not observed for
SG-ELM. This is not surprising, as the true positive rate of the
OS-ELM model is much lower than that of SG-ELM (see
Table III). On the other hand, SG-ELM has an inferior
accuracy in predicting the unstable modes, but this is not clearly
evident in the data sets used in Fig. 5.

VII. Conclusion

A stochastic gradient descent based online learning algorithm
for ELM has been developed that guarantees stability in
parameter estimation, suitable for control purposes. Further,
SG-ELM demands less computation compared to the OS-ELM
algorithm, as the covariance estimation step is eliminated.
A stability proof is developed based on the Lyapunov approach.
However, the SG-ELM algorithm might involve tedious tuning
of the step-size parameter as well as suffer from slow convergence.

The SG-ELM and OS-ELM algorithms are applied to
develop models for state variables and the dynamic operating
envelope of an HCCI engine to assist in model-based control.
The results from this article suggest that good generalization
performance can be achieved using both OS-ELM and
SG-ELM methods, but SG-ELM might have an advantage in
terms of stability, crucial for designing robust control systems.

Fig. 5: Classification results of OS-ELM and SG-ELM models showing CA50, IMEP and one input variable (fueling) for 2
unseen data sets. The color code indicates model prediction: green (and red) indicate stable (and unstable) prediction by the
model. The dotted line in the IMEP plot indicates the misfire limit, the dotted ellipse in the CA50 plot indicates the high variability instability
mode, while the dotted rectangle indicates a wrong prediction by the model.


Although SG-ELM appears to perform well in the
HCCI identification problem, a comprehensive analysis and
evaluation on several benchmark data sets is required and
will be considered in future work. From an application perspective,
interesting areas for exploration include implementing the
algorithm in real-time hardware, exploring a wide operating
range of HCCI operation and developing controllers.